Algebra and Matrix Normed Spaces
We begin by looking at why operator spaces are necessary in the study of operator algebras, together with many examples of, and ways to construct, operator algebras. Then we examine how certain basic algebraic relationships break down when norms are placed on them. This leads to ways of correcting these ideas using matrix norms.
Automatic assessment of English learner pronunciation using discriminative classifiers
This paper presents a novel system for automatic assessment of the pronunciation quality of English learner speech, based on deep neural network (DNN) features and phoneme-specific discriminative classifiers. DNNs trained on a large corpus of native and non-native learner speech are used to extract phoneme posterior probabilities. Part of the corpus includes per-phone teacher annotations, which allows training of two Gaussian Mixture Models (GMMs), representing correct pronunciations and typical error patterns. The likelihood ratio is then obtained for each observed phone. Several models were evaluated on a large corpus of English-learning students with a variety of skill levels, aged 13 upwards. The cross-correlation between the best system and the average human annotator reference scores is 0.72, with miss and false alarm rates around 19%. Automatic assessment is 81.6% correct with a high degree of confidence. The new approach significantly outperforms spectral distance based baseline systems.
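The two-GMM likelihood-ratio scoring the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random feature vectors stand in for DNN phoneme posteriors, and the component counts and dimensions are arbitrary choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in feature vectors (in the paper these would be DNN phoneme
# posterior features from annotated learner speech).
correct_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 8))
error_feats = rng.normal(loc=1.5, scale=1.0, size=(500, 8))

# One GMM for correct pronunciations, one for typical error patterns.
gmm_correct = GaussianMixture(n_components=4, random_state=0).fit(correct_feats)
gmm_error = GaussianMixture(n_components=4, random_state=0).fit(error_feats)

def pronunciation_score(phone_feats):
    """Mean log-likelihood ratio over the frames of one observed phone;
    positive values indicate a pronunciation closer to the correct model."""
    return float(np.mean(gmm_correct.score_samples(phone_feats)
                         - gmm_error.score_samples(phone_feats)))

# A phone drawn from the "correct" distribution should score positively.
test_phone = rng.normal(loc=0.0, scale=1.0, size=(20, 8))
score = pronunciation_score(test_phone)
```

The log-likelihood ratio makes the score a direct comparison of the two hypotheses, so a single threshold on it yields an accept/reject decision per phone.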
Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation
This paper presents a new method for the discovery of latent domains in diverse speech data, for use in the adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition. Our work focuses on the transcription of multi-genre broadcast media, which is often only categorised broadly in terms of high-level genres such as sports, news, documentary, etc. In terms of acoustic modelling, however, these categories are coarse. Instead, it is expected that a mixture of latent domains can better represent the complex and diverse behaviours within a TV show, and therefore lead to better and more robust performance. We propose a new method whereby these latent domains are discovered with Latent Dirichlet Allocation (LDA), in an unsupervised manner. These are used to adapt DNNs using the Unique Binary Code (UBIC) representation for the LDA domains. Experiments conducted on a set of BBC TV broadcasts, with more than 2,000 shows for training and 47 shows for testing, show that the use of LDA-UBIC DNNs reduces the error by up to 13% relative compared to the baseline hybrid DNN models.
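The LDA-then-binary-code pipeline above can be sketched in a few lines. This is a hypothetical illustration: the toy documents, the two-domain setting, and thresholding the posterior at the uniform prior are choices made here for demonstration, not the paper's exact UBIC recipe.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy "show transcripts"; real inputs would be transcriptions of TV shows.
docs = [
    "goal penalty match referee football",
    "election parliament minister vote policy",
    "goal match football striker league",
    "budget vote policy election debate",
]
counts = CountVectorizer().fit_transform(docs)

# Unsupervised discovery of latent domains with LDA.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
posteriors = lda.transform(counts)  # per-show domain posteriors, rows sum to 1

# Binary code over domains: 1 where the posterior exceeds the uniform prior.
# This vector would be appended to the DNN input as an auxiliary feature.
ubic = (posteriors > 1.0 / lda.n_components).astype(int)
```

Because the code is a fixed-length vector per show, it can be concatenated onto every acoustic frame of that show, conditioning the acoustic model on the inferred domain without any manual genre labels.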
Using phone features to improve dialogue state tracking generalisation to unseen states
The generalisation of dialogue state tracking
to unseen dialogue states can be very
challenging. In a slot-based dialogue system,
dialogue states lie in discrete space
where distances between states cannot be
computed. Therefore, the model parameters
to track states unseen in the training
data can only be estimated from more general
statistics, under the assumption that
every dialogue state will have the same underlying
state tracking behaviour. However,
this assumption is not valid. For example,
two values, whose associated concepts
have different ASR accuracy, may
have different state tracking performance.
Therefore, if the ASR performance of the
concepts related to each value can be estimated,
such estimates can be used as general
features. The features will help to relate
unseen dialogue states to states seen
in the training data with similar ASR performance.
Furthermore, if two phonetically
similar concepts have similar ASR
performance, the features extracted from
the phonetic structure of the concepts can
be used to improve generalisation. In
this paper, ASR and phonetic structurerelated
features are used to improve the
dialogue state tracking generalisation to
unseen states of an environmental control
system developed for dysarthric speakers
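One simple way to realise a phonetic-structure feature like the one described above is a normalised edit distance between the phone sequences of two slot values: phonetically close concepts can be expected to show similar ASR behaviour. The function and the ARPAbet-style transcriptions below are illustrative assumptions, not the paper's feature set.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences (lists of phone labels)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

def phonetic_feature(a, b):
    """Edit distance normalised by the longer sequence, in [0, 1]."""
    return phone_edit_distance(a, b) / max(len(a), len(b))

# "light" vs "lights" in a hypothetical ARPAbet-style transcription:
# one insertion over a length-4 sequence gives 0.25.
d = phonetic_feature(["L", "AY", "T"], ["L", "AY", "T", "S"])
```

An unseen value can then be characterised by its distances to the training-set values, letting the tracker borrow statistics from phonetically similar, previously seen concepts.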
Combining feature and model-based adaptation of RNNLMs for multi-genre broadcast speech recognition
Recurrent neural network language models (RNNLMs) have consistently outperformed n-gram language models when used in automatic speech recognition (ASR). This is because RNNLMs provide robust parameter estimation through the use of a continuous-space representation of words, and can generally model longer context dependencies than n-grams. The adaptation of RNNLMs to new domains remains an active research area, and the two main approaches are: feature-based adaptation, where the input to the RNNLM is augmented with auxiliary features; and model-based adaptation, which includes model fine-tuning and the introduction of adaptation layer(s) in the network. This paper explores the properties of both types of adaptation on multi-genre broadcast speech recognition. Two hybrid adaptation techniques are proposed, namely the fine-tuning of feature-based RNNLMs and the use of a feature-based adaptation layer. A method for the semi-supervised adaptation of RNNLMs, using topic model-based genre classification, is also presented and investigated. The gains obtained with RNNLM adaptation on a system trained on 700 h of speech are consistent for RNNLMs trained on both a small (10M words) and a large (660M words) set, with 10% perplexity and 2% word error rate improvements on a 28.3 h test set.
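The feature-based adaptation described above amounts to concatenating an auxiliary genre/topic vector onto the word embedding at every recurrent step. A minimal numpy sketch of one such step, with made-up dimensions and random weights rather than a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, feat_dim, hid = 1000, 32, 8, 64

E = rng.normal(size=(vocab, emb_dim))            # word embedding table
W = rng.normal(size=(hid, emb_dim + feat_dim))   # input weights sized for the
                                                 # augmented (word + feature) input
U = rng.normal(size=(hid, hid))                  # recurrent weights

def rnn_step(h, word_id, topic_vec):
    """One recurrent step with feature-based input augmentation."""
    x = np.concatenate([E[word_id], topic_vec])  # append auxiliary features
    return np.tanh(W @ x + U @ h)

# The topic vector (e.g. an LDA genre posterior) stays fixed across the
# utterance, conditioning every hidden state on the inferred domain.
topic = rng.normal(size=feat_dim)
h = np.zeros(hid)
for w in [3, 17, 5]:
    h = rnn_step(h, w, topic)
```

Model-based adaptation would instead keep the input unchanged and fine-tune (or insert an extra layer into) this network on in-domain text; the paper's hybrids combine the two.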